Processing Quantities with Heavy-Tailed Distribution of Measurement Uncertainty: How to Estimate the Tails of the Results of Data Processing

Authors

  • Michal Holcapek
  • Vladik Kreinovich
Abstract

Measurements are never absolutely accurate, so it is important to estimate how measurement uncertainty affects the result of data processing. Traditionally, this problem is solved under the assumption that the probability distributions of measurement errors are normal, or at least are concentrated, with high certainty, on a reasonably small interval. In practice, the distribution of measurement errors is sometimes heavy-tailed: very large values have a non-negligible probability. In this paper, we analyze the corresponding problem of estimating the tail of the result of data processing in such situations.

1 Formulation of the Problem

Need for data processing. In many practical situations, we are interested in the value of a quantity y which is difficult (or even impossible) to measure directly: for example, we may be interested in tomorrow’s weather, in the distance to a faraway planet, in the amount of oil in an oil well, etc. In such situations, we can often measure y indirectly, i.e.:

  • measure the values of auxiliary quantities x1, ..., xn which are related to the desired quantity y by a known relation y = f(x1, ..., xn), and then
  • use the results x̃1, ..., x̃n of measuring the quantities xi and the known dependence to compute the estimate ỹ = f(x̃1, ..., x̃n) for y.

The process of computing ỹ = f(x̃1, ..., x̃n) is known as data processing.

Need to estimate the uncertainty of the result of data processing. Measurements are never 100% accurate; so, in general, the measurement results x̃i are somewhat different from the actual values xi of the corresponding quantities. Because of these measurement errors, the estimate ỹ = f(x̃1, ..., x̃n) is, in general, different from the desired value y = f(x1, ..., xn) (often, there is an additional difference caused by the fact that the dependence between y and xi is only approximately known).
It is therefore important not just to generate an estimate ỹ, but also to gauge how much the actual value y can differ from this estimate, i.e., what is the uncertainty of the result of data processing; see, e.g., [7].

Estimating uncertainty of the result of data processing: traditional statistical approach. Usually, there are many different (and independent) factors which contribute to the measurement error. In many such situations, it is possible to apply the Central Limit Theorem (see, e.g., [9]), according to which, under reasonable conditions, the distribution of the joint effect of numerous independent factors is close to normal. In such situations, it is therefore reasonable to assume that all the measurement errors ∆xi := x̃i − xi are independent and normally distributed.

To describe a normal distribution, it is sufficient to know the mean μ and the standard deviation σ. Thus, under the normality assumption, to gauge the distribution of each measurement error ∆xi, we must know the mean μi and the standard deviation σi of this measurement error. If the known mean is different from 0, this means that this measuring instrument has a bias; we can always compensate for this bias by subtracting the value μi from all the measured values. After this subtraction, the mean error will become 0. Thus, without losing generality, we can assume that each measurement error is normally distributed with mean 0 and known standard deviation σi.

The traditional way of estimating the resulting uncertainty ∆y := ỹ − y in y is based on this assumption. Specifically, since the measurement errors ∆xi are usually relatively small, we can expand the expression

∆y = ỹ − y = f(x̃1, ..., x̃n) − f(x1, ..., xn) = f(x̃1, ..., x̃n) − f(x̃1 − ∆x1, ..., x̃n − ∆xn)

in Taylor series in ∆xi, ignore quadratic and higher order terms, and keep only the terms linear in ∆xi in this dependence. As a result, we get an expression
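The excerpt is cut off before it states the resulting expression, so the following is only a minimal sketch of the linearization step described above, not necessarily the authors' own formula. It assumes the standard first-order result: ∆y ≈ c1·∆x1 + ... + cn·∆xn with ci = ∂f/∂xi evaluated at the measured values, so that for independent zero-mean errors the standard deviation of ∆y is the square root of the sum of (ci·σi)². The function name `linearized_uncertainty`, the finite-difference step, and the sample function y = x1·x2 are all hypothetical choices made for illustration.

```python
import math
from typing import Callable, Sequence, Tuple


def linearized_uncertainty(
    f: Callable[..., float],
    x_meas: Sequence[float],
    sigmas: Sequence[float],
    rel_step: float = 1e-6,
) -> Tuple[float, float]:
    """Return (y_est, sigma_y) under first-order error propagation.

    Assumes independent, zero-mean measurement errors with known
    standard deviations `sigmas` (the traditional normal-error setting).
    """
    y_est = f(*x_meas)  # data processing: y~ = f(x~1, ..., x~n)
    variance = 0.0
    for i, (x_i, sigma_i) in enumerate(zip(x_meas, sigmas)):
        h = rel_step * max(abs(x_i), 1.0)  # finite-difference step (assumption)
        x_plus = list(x_meas)
        x_minus = list(x_meas)
        x_plus[i] = x_i + h
        x_minus[i] = x_i - h
        # c_i approximates the partial derivative of f with respect to x_i
        c_i = (f(*x_plus) - f(*x_minus)) / (2.0 * h)
        variance += (c_i * sigma_i) ** 2
    return y_est, math.sqrt(variance)


# Hypothetical example: y = x1 * x2, measured as (2.0, 3.0)
# with standard deviations (0.1, 0.2).
y_est, sigma_y = linearized_uncertainty(lambda a, b: a * b, [2.0, 3.0], [0.1, 0.2])
print(y_est, sigma_y)
```

This estimate relies on the errors having finite variance and being small enough for the linear terms to dominate; in the heavy-tailed situations the paper addresses, those assumptions may fail, which is why the tails need separate treatment.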


Similar resources

On Hurst exponent estimation under heavy-tailed distributions

In this paper, we show how the sampling properties of Hurst exponent methods of estimation change with the presence of heavy tails. We run extensive Monte Carlo simulations to find out how rescaled range analysis (R/S), multifractal detrended fluctuation analysis (MF − DFA), detrending moving average (DMA) and generalized Hurst exponent approach (GHE) estimate Hurst exponent on independent seri...


Investigations of the Material Composition of Iron-containing Tails of the Enrichment of the Mining and Processing Combines of the Kursk Magnetic Anomaly of Russia

The inevitable depletion of mineral resources, the constant deterioration of the geological and mining conditions for the development of mineral deposits and the restoration of raw materials from mining waste by recycling are all urgent problems we face today. The solution to this problem may ensure: a considerable extension of raw material source; decrease of investments in opening new deposit...


Joint Bayesian Stochastic Inversion of Well Logs and Seismic Data for Volumetric Uncertainty Analysis

Herein, an application of a new seismic inversion algorithm in one of Iran’s oilfields is described. Stochastic (geostatistical) seismic inversion, as a complementary method to deterministic inversion, is perceived as a combination of geostatistics and seismic inversion algorithms. This method integrates information from different data sources with different scales, as prior informat...


Statistical Wavelet-based Image Denoising using Scale Mixture of Normal Distributions with Adaptive Parameter Estimation

Removing noise from images is a challenging problem in digital image processing. This paper presents an image denoising method based on a maximum a posteriori (MAP) density function estimator, which is implemented in the wavelet domain because of its energy compaction property. The performance of the MAP estimator depends on the proposed model for noise-free wavelet coefficients. Thus in the wa...


Tail-scope: Using friends to estimate heavy tails of degree distributions in large-scale complex networks

Many complex networks in natural and social phenomena have often been characterized by heavy-tailed degree distributions. However, due to the rapidly growing size of network data and privacy concerns about using these data, it becomes more difficult to analyze complete data sets. Thus, it is crucial to devise effective and efficient estimation methods for heavy tails of degree distributio...




Publication date: 2013